Markov kernel


Complete Causal Identification from Ancestral Graphs under Selection Bias

Chen, Leihao, Mooij, Joris M.

arXiv.org Machine Learning

Many causal discovery algorithms, including the celebrated FCI algorithm, output a Partial Ancestral Graph (PAG). PAGs serve as an abstract graphical representation of the underlying causal structure, modeled by directed acyclic graphs with latent and selection variables. This paper develops a characterization of the set of extended-type conditional independence relations that are invariant across all causal models represented by a PAG. This theory allows us to formulate a general measure-theoretic version of Pearl's causal calculus and a sound and complete identification algorithm for PAGs under selection bias. Our results also apply when PAGs are learned by certain algorithms that integrate observational data with experimental data and incorporate background knowledge.


Notes on Forré's Notion of Conditional Independence and Causal Calculus for Continuous Variables

Chen, Leihao

arXiv.org Machine Learning

Recently, Forré (arXiv:2104.11547, 2021) introduced transitional conditional independence, a notion of conditional independence that provides a unified framework for both random and non-stochastic variables. The original paper establishes a strong global Markov property connecting transitional conditional independencies with suitable graphical separation criteria for directed mixed graphs with input nodes (iDMGs), together with a version of causal calculus for iDMGs in a general measure-theoretic setting. These notes aim to further illustrate the motivations behind this framework and its connections to the literature, highlight certain subtleties in the general measure-theoretic causal calculus, and extend the "one-line" formulation of the ID algorithm of Richardson et al. (Ann. Statist. 51(1):334--361, 2023) to the general measure-theoretic setting.


Unfolding with a Wasserstein Loss

Craig, Katy, Faktor, Benjamin, Nachman, Benjamin

arXiv.org Machine Learning

Data unfolding -- the removal of noise or artifacts from measurements -- is a fundamental task across the experimental sciences. Of particular interest in the present work are applications of data unfolding in physics, in which context the dominant approach is Richardson-Lucy (RL) deconvolution. The classical RL approach aims to find denoised data that, once passed through the noise model, is as close as possible to the measured data, in terms of Kullback-Leibler (KL) divergence. Fundamental to this approach is the hypothesis that the support of the measured data overlaps with the output of the noise model, so that the KL divergence correctly captures their similarity. In practice, this hypothesis is typically enforced by binning the measured data and noise model, introducing numerical error into the unfolding process. As a counterpoint to classical binned methods for unfolding, the present work studies an alternative formulation of the unfolding problem, using a Wasserstein loss instead of the KL divergence to quantify the similarity between the measured data and the output of the noise model. We establish sharp conditions for existence and uniqueness of optimizers; as a consequence we answer open questions of Li et al. [23] regarding necessary conditions for existence and uniqueness in the case of transport map noise models. Following these theoretical results, we then develop a provably convergent generalized Sinkhorn algorithm to compute approximate optimizers. Our algorithm requires only empirical observations of the noise model and measured data and scales with the size of the data, rather than the ambient dimension.
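For readers unfamiliar with Sinkhorn-type methods, the entropic-regularized optimal transport iteration they are built on can be sketched in a few lines of NumPy. This is the standard balanced Sinkhorn iteration, shown only for intuition; it is not the authors' generalized algorithm, and the names `a`, `b`, `C` are illustrative:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Entropic-regularized OT between histograms a and b with cost matrix C.

    Returns the transport plan P and the associated transport cost <P, C>.
    """
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):           # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # plan with (approximately) correct marginals
    return P, float(np.sum(P * C))

# Toy example: transport between two uniform point clouds on the line.
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.5, 1.5, 5)
C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost
a = np.full(5, 0.2)                   # uniform source weights
b = np.full(5, 0.2)                   # uniform target weights
P, cost = sinkhorn(a, b, C)
```

After the final update of `u` the row marginals of `P` match `a` exactly, while the column marginals converge to `b` geometrically in the number of iterations; the regularization strength `eps` trades off plan sharpness against convergence speed.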



A Non-asymptotic Analysis for Learning and Applying a Preconditioner in MCMC

Hird, Max, Maire, Florian, Negrea, Jeffrey

arXiv.org Machine Learning

Preconditioning is a common method applied to modify Markov chain Monte Carlo algorithms with the goal of making them more efficient. In practice it is often extremely effective, even when the preconditioner is learned from the chain. We analyse and compare the finite-time computational costs of schemes which learn a preconditioner based on the target covariance or the expected Hessian of the target potential with that of a corresponding scheme that does not use preconditioning. We apply our results to the Unadjusted Langevin Algorithm (ULA) for an appropriately regular target, establishing non-asymptotic guarantees for preconditioned ULA which learns its preconditioner. Our results are also applied to the unadjusted underdamped Langevin algorithm in the supplementary material. To do so, we establish non-asymptotic guarantees on the time taken to collect $N$ approximately independent samples from the target for schemes that learn their preconditioners under the assumption that the underlying Markov chain satisfies a contraction condition in the Wasserstein-2 distance. This approximate independence condition, that we formalize, allows us to bridge the non-asymptotic bounds of modern MCMC theory and classical heuristics of effective sample size and mixing time, and is needed to amortise the costs of learning a preconditioner across the many samples it will be used to produce.
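For intuition, a fixed-preconditioner ULA update of the kind analysed above can be written in a few lines. Here `M_sqrt` is a matrix square root of the preconditioner M, which in the schemes discussed would be learned from the chain (e.g. from an estimated target covariance); all names below are illustrative, not the authors' notation:

```python
import numpy as np

def preconditioned_ula(grad_U, x0, h, n_steps, M_sqrt, rng):
    """ULA with preconditioner M = M_sqrt @ M_sqrt.T:
        x_{k+1} = x_k - h * M @ grad_U(x_k) + sqrt(2h) * M_sqrt @ xi_k.
    """
    M = M_sqrt @ M_sqrt.T
    x = np.asarray(x0, dtype=float).copy()
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        xi = rng.standard_normal(x.shape)
        x = x - h * (M @ grad_U(x)) + np.sqrt(2.0 * h) * (M_sqrt @ xi)
        out[k] = x
    return out

# Toy target: N(0, diag(1, 4)). Preconditioning with M = target covariance
# makes the effective condition number of the dynamics equal to 1.
Sigma_inv = np.diag([1.0, 0.25])
grad_U = lambda x: Sigma_inv @ x          # gradient of U(x) = x^T Sigma^{-1} x / 2
M_sqrt = np.diag([1.0, 2.0])              # square root of Sigma
rng = np.random.default_rng(0)
samples = preconditioned_ula(grad_U, np.zeros(2), h=0.01, n_steps=20000,
                             M_sqrt=M_sqrt, rng=rng)
```

On this Gaussian toy target the per-coordinate contraction factor becomes (1 - h) in both directions, illustrating why matching M to the target covariance helps; the empirical variances of the chain approach those of the target up to the usual O(h) discretization bias.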





Invariant Representations via Wasserstein Correlation Maximization

Eikenberry, Keenan, Liu, Lizuo, Lee, Yoonsang

arXiv.org Machine Learning

This work investigates the use of Wasserstein correlation -- a normalized measure of statistical dependence based on the Wasserstein distance between a joint distribution and the product of its marginals -- for unsupervised representation learning. Unlike, for example, contrastive methods, which naturally cluster classes in the latent space, we find that an (auto)encoder trained to maximize Wasserstein correlation between the input and encoded distributions instead acts as a compressor, reducing dimensionality while approximately preserving the topological and geometric properties of the input distribution. More strikingly, we show that Wasserstein correlation maximization can be used to arrive at an (auto)encoder -- either trained from scratch, or else one that extends a frozen, pretrained model -- that is approximately invariant to a chosen augmentation, or collection of augmentations, and that still approximately preserves the structural properties of the non-augmented input distribution. To do this, we first define the notion of an augmented encoder using the machinery of Markov-Wasserstein kernels. When the maximization objective is then applied to the augmented encoder, as opposed to the underlying, deterministic encoder, the resulting model exhibits the desired invariance properties. Finally, besides our experimental results, which show that even simple feedforward networks can be imbued with invariants or can, alternatively, be used to impart invariants to pretrained models under this training process, we additionally establish various theoretical results for optimal transport-based dependence measures. Code is available at https://github.com/keenan-eikenberry/wasserstein_correlation_maximization .
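The "augmented encoder" construction mentioned above can be illustrated with a minimal sketch: composing a deterministic encoder with a random choice of augmentation yields a Markov kernel from inputs to latent codes. The names below are hypothetical, not the paper's API:

```python
import numpy as np

def make_augmented_encoder(encode, augmentations, rng):
    """Markov kernel x -> law of encode(T(x)), with T drawn uniformly
    from the given list of augmentations."""
    def kernel(x):
        T = augmentations[rng.integers(len(augmentations))]
        return encode(T(x))
    return kernel

# Toy example: a scalar "encoder" and a sign-flip augmentation group.
encode = lambda x: x * 2.0
augs = [lambda x: x, lambda x: -x]
kernel = make_augmented_encoder(encode, augs, np.random.default_rng(0))
draws = {kernel(1.0) for _ in range(50)}   # stochastic outputs
```

An encoder that were invariant to these augmentations would make `kernel(x)` deterministic; in the paper's setup, maximizing Wasserstein correlation of the augmented encoder (rather than the deterministic one) is what pushes training toward that invariant regime.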


Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks

Millard, Andrew, Murphy, Joshua, Maskell, Simon, Zhao, Zheng

arXiv.org Machine Learning

Previous research has shown the benefits that Bayesian methods can bring to certain problems within deep learning (Gal et al., 2017). However, computing the exact posterior distributions of Bayesian neural networks (BNNs) is a difficult task, as traditional methods such as Markov chain Monte Carlo (MCMC) (Hastings, 1970) are computationally poorly suited to exploring high-dimensional spaces and dealing with large amounts of data. Parametric methods such as variational inference are better suited to these difficulties, but only give an approximation to the posterior distribution. These posterior landscapes have been found to be highly complex (Izmailov et al., 2021a), and therefore variational methods often give a poor approximation of the posterior. Sequential Monte Carlo (SMC) samplers (Doucet et al., 2001) are an alternative to MCMC methods which also provide an empirical estimate of the posterior distribution. SMC samplers are instantly parallelisable (Varsi et al., 2021b) and therefore can take advantage of the GPU resources commonly used in machine learning to speed up the training process. MCMC methods often require a warm-up period to adapt the hyperparameters, after which the chains can be parallelised; however, the hyperparameters must remain fixed after this warm-up period to obey stationarity. This means that SMC samplers can be more flexible than MCMC methods.
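The contrast drawn above can be made concrete with a minimal tempered SMC sampler that uses an unadjusted Langevin step as its gradient-based proposal. This is an illustrative sketch along a geometric prior-to-posterior path, not the paper's algorithm, and all names are assumptions:

```python
import numpy as np

def smc_langevin(log_target, grad_log_target, n, betas, step, rng):
    """Tempered SMC from a N(0, I) prior to the target along the geometric
    path pi_b ∝ prior^(1-b) * target^b, with one Langevin move per stage."""
    x = rng.standard_normal((n, 1))
    log_prior = lambda x: -0.5 * np.sum(x**2, axis=1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance weights for the tempering increment.
        logw = (b - b_prev) * (log_target(x) - log_prior(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(n, size=n, p=w)]          # multinomial resampling
        # Gradient-based (unadjusted Langevin) move targeting pi_b.
        drift = (1.0 - b) * (-x) + b * grad_log_target(x)
        x = x + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Toy posterior: N(3, 1).
log_target = lambda x: -0.5 * np.sum((x - 3.0) ** 2, axis=1)
grad_log_target = lambda x: -(x - 3.0)
rng = np.random.default_rng(0)
particles = smc_langevin(log_target, grad_log_target, 2000,
                         np.linspace(0.0, 1.0, 21), 0.1, rng)
```

Note that the Langevin step size and temperature schedule can be adapted between stages without breaking correctness, since each stage is a fresh importance-sampling step; this is the flexibility relative to stationarity-bound MCMC that the abstract alludes to.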